--- layout: post title: Geospatial Analysis in R excerpt: This tutorial demonstrates the mapping functionality of R - as presented by MAJ Dave Beskow at the 1st DSCOE Event. --- Geospatial Analysis in R

Introduction: The power to “roll your own” maps

In 2005 a group of R developers created the R package sp to extend R with classes and methods for spatial data (Pebesma and Bivand, 2005). Since then, hundreds of packages have been created to assist in analyzing and visualizing spatial data. Given the myriad of GIS software that already exists, created by ESRI as well as many other companies, what is the advantage of conducting geospatial analysis in R? Here’s a comparison of GIS and R provided by Robert Hijmans (UC Davis):

GIS R
Visual interaction Data & model focused
Data management Analysis
Geometric operations Attributes as important
Standard workflows Creativity & innovation
Single map production Many (simpler) maps
Click, click, click, & click Repeatability (single script)
Speed of execution Speed of development
Typically expensive Powerfull and free

This short class is designed to introduce geospatial analysis in R. To do this we will focus on the building blocks as well as some packages that facilitate easy analysis and visualization. The primary topics that I will focus on are:

  1. Point Data
  2. Polygon Data
  3. Understanding and using datums
  4. Geo-Coding
  5. Introducing the ggmap package
  6. Advanced Visualization Topics

If you need to conduct a quick review of R, I recommend that you look at two introductory lessons that we’ve developed for West Point cadets:

Before you begin conducting spatial analysis in R, make sure that you install three necessary packages: ggmap, Rcpp, and sp. This can be done in the packages tab in the lower right quadrant of RStudio. It can also be done on the command line by typing install.packages('ggmap') , install.packages('Rcpp'), and install.packages('sp')

Point Data

Point data is the simplest type of geospatial data. Spatial point data is used represent the spatial nature of events. Examples of point data include the location of a customer’s iPhone purchases in business, the location of a crime in law enforcement, the location of attacks in the military, or the location of infrastructure in engineering. Point data means that an entity is represented by an x coordinate and a y coordinate in space. In data, this is most often done by recording a longitude and latitude. We are going to import a data set that you should already be familiar with: the Houston Crime data set. You can import this data with the command:

library(ggmap)
library(Rcpp)
data(crime)  ##After running this commend, you shoud see 'crime' in your working environment.

Now let’s explore the data. Let’s first see what the column names are:

names(crime)
##  [1] "time"     "date"     "hour"     "premise"  "offense"  "beat"    
##  [7] "block"    "street"   "type"     "suffix"   "number"   "month"   
## [13] "day"      "location" "address"  "lon"      "lat"

We see that we have some time fields, type of offense fields, as well as some spatial fields. In addition to the street address, we actually have the longitude and the latitude. We can get a little more information on the columns with the command:

str(crime)
## 'data.frame':    86314 obs. of  17 variables:
##  $ time    : POSIXt, format: "2010-01-01 01:00:00" "2010-01-01 01:00:00" ...
##  $ date    : chr  "1/1/2010" "1/1/2010" "1/1/2010" "1/1/2010" ...
##  $ hour    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ premise : chr  "18A" "13R" "20R" "20R" ...
##  $ offense : Factor w/ 7 levels "aggravated assault",..: 4 6 1 1 1 3 3 3 3 3 ...
##  $ beat    : chr  "15E30" "13D10" "16E20" "2A30" ...
##  $ block   : chr  "9600-9699" "4700-4799" "5000-5099" "1000-1099" ...
##  $ street  : chr  "marlive" "telephone" "wickview" "ashland" ...
##  $ type    : chr  "ln" "rd" "ln" "st" ...
##  $ suffix  : chr  "-" "-" "-" "-" ...
##  $ number  : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ month   : Ord.factor w/ 8 levels "january"<"february"<..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ day     : Ord.factor w/ 7 levels "monday"<"tuesday"<..: 5 5 5 5 5 5 5 5 5 5 ...
##  $ location: chr  "apartment parking lot" "road / street / sidewalk" "residence / house" "residence / house" ...
##  $ address : chr  "9650 marlive ln" "4750 telephone rd" "5050 wickview ln" "1050 ashland st" ...
##  $ lon     : num  -95.4 -95.3 -95.5 -95.4 -95.4 ...
##  $ lat     : num  29.7 29.7 29.6 29.8 29.7 ...

Now that we’ve learned a little bit about the data, let’s explore the spatial component of our data. To do this, I usually start by plotting it against a plane white background as simple x and y coordinates. To do this in R, we simply type:

plot(crime$lon,crime$lat,xlab="lon",ylab="lat")

This plot shows us immediately that we have some outliers that seem to be pretty far away from Houston, Texas.

Now let’s showcase the power of the ggmap package by plotting this same data on an image from Google Maps. First, let’s learn how to get a map from Google Maps. The following code gets and plots a map of Houston from Google Maps:

qmap("houston", zoom = 13)  #This command retrieves and plots a map of Houston

Note that if we change the zoom to 6, we would get a map of most of Texas centered on Houston:

qmap("houston", zoom = 6) #Change zoom

Now, in order to add points to this plot, we need need to store this image in an object and add to it. Plotting with the ggmap package is done with “layers.” You first plot the map and then you can add layers of other geospatial objects and/or text. We do this with the code below:

HoustonMap<-qmap("houston", zoom = 14)  #First, get a map of Houston

HoustonMap+                                        ##This plots the Houston Map
geom_point(aes(x = lon, y = lat), data = crime)    ##This adds the points to it

Note here that we assign the column name lon to the x variable, the column name lat as the y variable, and we tell R to get these field names from the crime data frame. If we wanted to color the points by the type offense, we could do this with the command:

HoustonMap+
geom_point(aes(x = lon, y = lat, colour = offense),
              data = crime)

Note here that offense is the name of the column that contains the categorical variable that we want to use to distinguish between the color of our points.

HoustonMap+
geom_point(aes(x = lon, y = lat, colour = offense),
              data = crime)+
ggtitle("Map of Crime in Houston")

Also note that we can easily save this image as a file by wrapping our script with two commands. These commands allow us to print our graphic to a file in our working directory instead of printing it to the screen. Here I’ve given a method to print to a .png graphics file:

png("myHoustonMap.png")  ##This is the name of the image file (in this case a .png file)
HoustonMap+
geom_point(aes(x = lon, y = lat, colour = offense),
              data = crime)+
ggtitle("Map of Crime in Houston")
dev.off()  ##This command indicates that we're done creating our plot.  It finalizes and closes the .png file.

Geocoding

There are many times when you are given data that has a location field, but it doesn’t have a column for latitude and longitude. You may have a field for address, city, or other text describing the location, but you don’t have the latitude and longitude. In these cases, the ggmap package provides a way for you to geocode these locations. The geocode function works by pinging Google Maps as if you were searching it on a web browser, and returning the lat and lon for a given location. For example, we can write:

geocode("West Point, NY")
##         lon      lat
## 1 -73.95597 41.39148

and note that this returns the latitude and longitude for West Point, NY. You can also enter a full address. Here we’ll enter the address for the White House:

geocode('1600 Pennsylvania Avenue Northwest, Washington, DC 20500')
##         lon      lat
## 1 -77.03648 38.89768

Try entering your home address. You can use this to get the lat and lon information so that you can plot your spatial data. For example, let’s say that you wanted to create a map of your squad. You could create a squad data frame that looks like this:

mySquad<-data.frame(
  Names=c("Bill","Dave","Liz","Cody","Austin","Amber","Jimmy","Mike","Amanda"),
  Location=c("Boston, MA","Chicago, IL","Atlanta, GA","Portland, OR","Albuquerque, NM",
              "Dallas, TX","St Louis, MO","Miami, FL","San Francisco, CA"),stringsAsFactors=FALSE)

Here’s what our data looks like.

mySquad
##    Names          Location
## 1   Bill        Boston, MA
## 2   Dave       Chicago, IL
## 3    Liz       Atlanta, GA
## 4   Cody      Portland, OR
## 5 Austin   Albuquerque, NM
## 6  Amber        Dallas, TX
## 7  Jimmy      St Louis, MO
## 8   Mike         Miami, FL
## 9 Amanda San Francisco, CA

Let’s add a lat and lon column to this data. We can do this by geocoding in a for loop like this:

for(i in 1:9){
  temp<-geocode(mySquad$Location[i]) #Geocode each location and store it in temp
  mySquad$lon[i]<-temp$lon           #Assign the lon
  mySquad$lat[i]<-temp$lat           #Assign the lat
}

Note that our data now has a spatial aspect that we can use for analysis:

mySquad
##    Names          Location        lon      lat
## 1   Bill        Boston, MA  -71.05888 42.36008
## 2   Dave       Chicago, IL  -87.62980 41.87811
## 3    Liz       Atlanta, GA  -84.38798 33.74900
## 4   Cody      Portland, OR -122.67648 45.52306
## 5 Austin   Albuquerque, NM -106.60555 35.08533
## 6  Amber        Dallas, TX  -96.79699 32.77666
## 7  Jimmy      St Louis, MO  -90.19940 38.62700
## 8   Mike         Miami, FL  -80.19179 25.76168
## 9 Amanda San Francisco, CA -122.41942 37.77493

We can now plot our squad

US<-qmap("United States",zoom=4,color="bw")

US+
geom_point(aes(x = lon, y = lat),color="red",data = mySquad)+
  annotate('text', x=mySquad$lon, y=mySquad$lat+1, label = mySquad$Names, colour = I('red'), size = 4)+
  ggtitle("My Squad")

Lesson 2: Working with Shape files in R

Introduction to Shapefiles in R

Shapefiles are a popular method of representing geospatial vector data in GIS software. This file format was developed by ESRI (a notable GIS Software Company). The shapefile format can store data in the form of points, lines, or polygons. Shapefiles are used to represent adminsistrative boundaries (think country/state/county borders), roads, rivers, lakes, etc.

The raster package in R makes it very easy to download various administrative boundaries for any country. You just have to know the three letter ISO Code for the country of interest (you can look this up on Google). Below we demonstrate how to load the raster library, and get the province level administrative boundaries for France.

library(raster)                                  ##Load the Raster Library
france<-getData('GADM', country='FRA', level=1)  ##Get the Province Shapefile for France
plot(france)                                     ##Plot this shapefile

Note that level 1 detail give the boundaries one level below the national boundaries. Actually, level 0 gives only national boundaries, level 1 gives the equivalent of province level boundaries, and each susequent level gives finer and finer granularity. In the United States, level 0 give our national boundary, level 1 gives state boundaries, and level 2 adds county boundaries. You can see level 0 through level 3 for France in the following plot.

alt text

Below we will retrieve and plot the county level shapefile for the United States. Note that when we plot this, that the continental United States is extremely small since it also plots Alaska, Hawaii, and some other islands that belong to the US.

us<-getData('GADM', country='USA', level=2)  #Get the County Shapefile for the US
plot(us)

alt text

Now let’s explore this data a little further. If we check the class of the us object (by typing class(us)), we will see that it is a SpatialPolygonsDataFrame. This means that it is multiple polygons that are held together in a data frame. We can access each of these polygons in a similar way that we access elements of a normal data frame. Now let’s look at the names of this data frame:

names(us)
##  [1] "PID"       "ID_0"      "ISO"       "NAME_0"    "ID_1"     
##  [6] "NAME_1"    "ID_2"      "NAME_2"    "NL_NAME_2" "VARNAME_2"
## [11] "TYPE_2"    "ENGTYPE_2"

Notice there are various types of names in this data frame. Name 0 represents national boundaries, Name 1 represents state names, and Name 2 represents county names. We can see this further if we plot all of the unique NAME_1:

unique(us$NAME_1)
##  [1] "Georgia"              "Alabama"              "Alaska"              
##  [4] "Arizona"              "Arkansas"             "California"          
##  [7] "Colorado"             "Connecticut"          "Delaware"            
## [10] "District of Columbia" "Florida"              "Hawaii"              
## [13] "Idaho"                "Illinois"             "Indiana"             
## [16] "Iowa"                 "Kansas"               "Kentucky"            
## [19] "Louisiana"            "Maine"                "Maryland"            
## [22] "Massachusetts"        "Michigan"             "South Dakota"        
## [25] "Tennessee"            "Texas"                "West Virginia"       
## [28] "Minnesota"            "Mississippi"          "Missouri"            
## [31] "Montana"              "Nebraska"             "Nevada"              
## [34] "New Hampshire"        "New Jersey"           "New Mexico"          
## [37] "New York"             "North Carolina"       "North Dakota"        
## [40] "Ohio"                 "Oklahoma"             "Oregon"              
## [43] "Pennsylvania"         "Rhode Island"         "South Carolina"      
## [46] "Utah"                 "Vermont"              "Virginia"            
## [49] "Washington"           "Wisconsin"            "Wyoming"

Below we demonstrate how to subset a SpatialPolygonsDataFrame. Note here that if we wanted to only plot the Northwestern United States, we could use the following code to create a subset that only contains Washington, Oregon, and Idaho.

 northwest<-subset(us,NAME_1=="Washington" | NAME_1=="Oregon" | NAME_1=="Idaho")
 plot(northwest)

Now let’s create a subset that only contains Texas.

texas<-subset(us,NAME_1=="Texas")
plot(texas)

Now we’ll demonstrate how to plot point data on top of a shapefile. If you start by plotting the shapefile, you only have to call the points command in order to plot the point data on top of the shapefile.

library(ggmap)   ##load the ggmap package so we can access the crime data
data(crime)      ##load the crime data
plot(texas)      ##plot texas
points(crime$lon,crime$lat,col="red",pch=16)  ##Add the crime data to the map

Here we can see clearly that some of our points are a long way from Houston.

Choropleth Maps (Heat Maps)

Another great way to show spatial data is with a choropleth map (sometimes called a heat map). A choropleth map shades a SpatialPolygonDataFrame by some measurement of spatial data. One very useful way to shade a map is by the density of spatial points found in each polygon. For example, we could shade the map of Texas with the density of crime data in the crime data set. Before we do this, let me first give you a mapping function that I created for this purpose back in 2011. I am not going to explain all of the computations that are performed in this code, but you will need to copy and paste this code into your console in order for this function to be available to you (you can also source this from an R script file if that is easier). You will need to load the sp and RColorBrewer packages before using this code.

heatMap <-function(data,shape=NULL,col="blue",main="Sample HeatMap"){
  # Plots a Heat Map of a Polygons Data Frame.  This will 
  # demonstrate density within a finite set of polygons
    #
    # Args:
    #   data:   Spatial Points dataframe
    #   shape:  Polygons Data Frame 
    #
    #
    #   Notes:  This function requires the sp and RColorBrewer
    #           Packages
    #
    #   Beskow: 03/28/11   
    #
    is.installed <- function(mypkg) is.element(mypkg, 
                installed.packages()[,1])
    if (is.installed(mypkg="sp")==FALSE)  {
        stop("sp package is not installed")}
    if (is.installed(mypkg="RColorBrewer")==FALSE)  {
        stop("RColorBrewer package is not installed")}
    if (!class(data)=="SpatialPointsDataFrame")  {
        stop("data argument is not SpatialPointsDataFrame")}
require(sp)
require(RColorBrewer)
freq_table<-data.frame(tabulate(over(as(data,"SpatialPoints"),
    as(shape,"SpatialPolygons")),nbins=length(shape)))
names(freq_table)<-"counts"

shape1<-spChFIDs(shape,as.character(1:length(shape)))
row.names(as(shape1,"data.frame"))
spdf<-SpatialPolygonsDataFrame(shape1, freq_table, match.ID = TRUE)

rw.colors<-colorRampPalette(c("white",col))
spplot(spdf,scales = list(draw = TRUE),
        col.regions=rw.colors(max(freq_table)), main=main)
}

Now that we have this function read into our Global Environment, we can use it. Before we use it, we need to create an SpatialPointsDataFrame with our crime data. This is done below (after loading the sp and RColorBrewer libraries).

library(sp)
library(RColorBrewer)
## Warning: package 'RColorBrewer' was built under R version 3.1.3
crime.sp<-crime     ##create a new object that we will coerce to a SpatialPointsDataFrame
crime.sp<-crime.sp[!is.na(crime.sp$lat),]   ##Get rid of all NA's in the location field
coordinates(crime.sp)<-c("lon","lat")       ##Assigning coordinates coerces this to a SpatialPointsDataFrame
proj4string(crime.sp)<-proj4string(texas)   ##Assigns the texas spatial datum to our new data frame

We can check our new data set by running the code:

class(crime.sp)  #Prints the class of an object
## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"

This should show us that crime.sp is a SpatialPointsDataFrame. Now that we’ve checked this, we just have to run a simple command that will create our choropleth.

heatMap(crime.sp,texas,col="red",main="Houston Crime Data Denstiy")

This map isn’t very interesting, mainly because the data is primarily centered on a single county. A more interesting plot would be to plot the protests in France following the Charlie Hebdo shooting in January, 2015.

heatMap(protests.sp,france,col="blue", main="Protests in France, 15JAN15-31JAN15")

Note here that most of the protests are located away from Paris. Also note that Southwest France has see more than 1600 protests over these 15 days.

Lesson 3: Working with Heatmaps and Plots over Polygons

In this lesson we are going to learn how to create a contour heatmap of spatial point data as well as create pie plots positioned on the centroid of a polygon.

Contour Heat Maps with ggmap

A contoured heat map is another way to demonstrate data density, especially if you do not have or do not desire to create a heatmap over a shapefile (like the country’s provinces). The contour heatmap uses two-dimentsional kernel density estimation in order to create the contoured heatmap of the spatial point data. Think of this as a histogram in two dimensions

Below we demonstrate how to do this using the crime data

library(ggmap)   ##load the ggmap package so we can access the crime data
data(crime)      ##load the crime data

We will plot the contour map over a ggmap:

houston <- get_map(location = "houston", zoom = 13) ##Get the houston map
houstonMap<-ggmap(houston, extent = "device")       ##Prepare Map

houstonMap +
  stat_density2d(aes(x = lon, y = lat, fill = ..level..,alpha=..level..), bins = 10, geom = "polygon", data = crime) +
  scale_fill_gradient(low = "black", high = "red")+
  ggtitle("Map of Crime Density in Houston")

Once you do this, play with the number of bins to see how this variable will change the look of the contours. Try 5, 10, and 15. You can also experiment with changing the color of the heat map.

Pieplots over Polygons

Often times we want to understand the spatial nature of a categorical variable. One way to do this is to position multiple plots over their respective geographical area. To demonstrate this, we will load an unclassified data set about attacks in Afghanistan. This comes from the Worldwide Incidents Tracking System (WITS database) which is maintained by the National Counterterrorism Center. This data is available to download from CIS as a csv file.

Copy and paste the function below into your console (or source it from a file). This will load the Pie over Polygon procedure.

plotPiePoly <-function(data,shape,factor,main="Pie Plot",
           size=c(1,1),legend=TRUE,legend.pos="bottomright"){
    # Plots a Pie Plot over over Polygons in a SpatialPolygonDataFrame
    #
    # Args:
    #   data:       Spatial Points dataframe
    #   shape:          Shapefile  
    #   factor.column:  The column in the SpatialPointsDataFrame
    #   main:           the title of the graph
    #   size:           The size of the pieplot in inches
    #
    #   Notes:  This function requires the sp package.
    #
    #   Beskow: 03/30/11   
    #
    is.installed <- function(mypkg) is.element(mypkg, 
                                               installed.packages()[,1])
    if (is.installed(mypkg="sp")==FALSE)  {
      stop("sp package is not installed")}
    if (is.installed(mypkg="TeachingDemos")==FALSE)  {
      stop("TeachingDemos package is not installed")}
    if (!class(data)=="SpatialPointsDataFrame")  {
      stop("data argument is not SpatialPointsDataFrame")}
    require(sp)
    require(TeachingDemos)
    names(data)[grep(factor,names(data))]<-"factor"
    data.table<-table(data$factor)
    data.table<-sort(data.table[data.table>0],decreasing=TRUE)
    my.names<-names(data.table)
    palette(rainbow(length(my.names)))
    plot(shape)
    if(legend){
      legend(legend.pos, my.names, col=c(2:(length(my.names)+1)),
             text.col = "green4", pch = c(15), bg = 'gray90',cex=0.8)}
    for (i in 1:length(shape)){
      x<-!is.na(over(as(data,"SpatialPoints"),as(shape[i,],
                                                    "SpatialPolygons")))
      if(sum(x)>0){
        my.table<-table(as.factor(data$factor[x]))
        my.table<-my.table[my.table>0]
        table.Colors<-match(names(my.table),my.names)+1
        subplot(pie(my.table,col=table.Colors,labels=NA),
                coordinates(shape[i,])[1],
                coordinates(shape[i,])[2],size=size)
      }
    }
    palette("default")      
    mtext(main,cex=2)
  }

Now that you have the function in your working environment, you can load the relevant packages, read in the WITS data, and create your plot. Note that this plot looks better if you “zoom” in on it.

##Load Relevant Packages
library(sp)
library(TeachingDemos)
## Warning: package 'TeachingDemos' was built under R version 3.1.3
library(raster)

##Get the Afghanistan Shapefile
afg<-getData('GADM', country='AFG', level=1)  #Get the Province Shapefile for France

##Load the WITS data from CSV
wits_sp<-read.csv("wits_sp.csv",as.is=TRUE)

##Create the SpatialPointsDataFrame
coordinates(wits_sp)<-c("lon","lat")
proj4string(wits_sp)<-proj4string(afg)

##Now all you have to do is plot it!
plotPiePoly(data = wits_sp, 
            shape=afg, 
            factor = "flag", 
            size = c(0.5, 0.5), 
            legend = TRUE,
            main = "Sample Pieplot Over Polygon (WITS Data)")

This allows you to visualize the spatial nature of a categorical variable.

In this lesson we are going to learn how to create a contour heatmap of spatial point data as well as create pie plots positioned on the centroid of a polygon.

Contour Heat Maps with ggmap

A contoured heat map is another way to demonstrate data density, especially if you do not have or do not desire to create a heatmap over a shapefile (like the country’s provinces). The contour heatmap uses two-dimentsional kernel density estimation in order to create the contoured heatmap of the spatial point data. Think of this as a histogram in two dimensions

Below we demonstrate how to do this using the crime data

library(ggmap)   ##load the ggmap package so we can access the crime data
data(crime)      ##load the crime data

We will plot the contour map over a ggmap:

houston <- get_map(location = "houston", zoom = 13) ##Get the houston map
houstonMap<-ggmap(houston, extent = "device")       ##Prepare Map

houstonMap +
  stat_density2d(aes(x = lon, y = lat, fill = ..level..,alpha=..level..), bins = 10, geom = "polygon", data = crime) +
  scale_fill_gradient(low = "black", high = "red")+
  ggtitle("Map of Crime Density in Houston")

Once you do this, play with the number of bins to see how this variable will change the look of the contours. Try 5, 10, and 15. You can also experiment with changing the color of the heat map.

Pieplots over Polygons

Often times we want to understand the spatial nature of a categorical variable. One way to do this is to position multiple plots over their respective geographical area. To demonstrate this, we will load an unclassified data set about attacks in Afghanistan. This comes from the Worldwide Incidents Tracking System (WITS database) which is maintained by the National Counterterrorism Center. This data is available to download from CIS as a csv file.

Copy and paste the function below into your console (or source it from a file). This will load the Pie over Polygon procedure.

plotPiePoly <-function(data,shape,factor,main="Pie Plot",
           size=c(1,1),legend=TRUE,legend.pos="bottomright"){
    # Plots a Pie Plot over over Polygons in a SpatialPolygonDataFrame
    #
    # Args:
    #   data:       Spatial Points dataframe
    #   shape:          Shapefile  
    #   factor.column:  The column in the SpatialPointsDataFrame
    #   main:           the title of the graph
    #   size:           The size of the pieplot in inches
    #
    #   Notes:  This function requires the sp package.
    #
    #   Beskow: 03/30/11   
    #
    is.installed <- function(mypkg) is.element(mypkg, 
                                               installed.packages()[,1])
    if (is.installed(mypkg="sp")==FALSE)  {
      stop("sp package is not installed")}
    if (is.installed(mypkg="TeachingDemos")==FALSE)  {
      stop("TeachingDemos package is not installed")}
    if (!class(data)=="SpatialPointsDataFrame")  {
      stop("data argument is not SpatialPointsDataFrame")}
    require(sp)
    require(TeachingDemos)
    names(data)[grep(factor,names(data))]<-"factor"
    data.table<-table(data$factor)
    data.table<-sort(data.table[data.table>0],decreasing=TRUE)
    my.names<-names(data.table)
    palette(rainbow(length(my.names)))
    plot(shape)
    if(legend){
      legend(legend.pos, my.names, col=c(2:(length(my.names)+1)),
             text.col = "green4", pch = c(15), bg = 'gray90',cex=0.8)}
    for (i in 1:length(shape)){
      x<-!is.na(over(as(data,"SpatialPoints"),as(shape[i,],
                                                    "SpatialPolygons")))
      if(sum(x)>0){
        my.table<-table(as.factor(data$factor[x]))
        my.table<-my.table[my.table>0]
        table.Colors<-match(names(my.table),my.names)+1
        subplot(pie(my.table,col=table.Colors,labels=NA),
                coordinates(shape[i,])[1],
                coordinates(shape[i,])[2],size=size)
      }
    }
    palette("default")      
    mtext(main,cex=2)
  }

Now that you have the function in your working environment, you can load the relevant packages, read in the WITS data, and create your plot. Note that this plot looks better if you “zoom” in on it.

##Load Relevant Packages
library(sp)
library(TeachingDemos)
library(raster)

##Get the Afghanistan Shapefile
afg<-getData('GADM', country='AFG', level=1)  #Get the Province Shapefile for France

##Load the WITS data from CSV
wits_sp<-read.csv("wits_sp.csv",as.is=TRUE)

##Create the SpatialPointsDataFrame
coordinates(wits_sp)<-c("lon","lat")
proj4string(wits_sp)<-proj4string(afg)

##Now all you have to do is plot it!
plotPiePoly(data = wits_sp, 
            shape=afg, 
            factor = "flag", 
            size = c(0.5, 0.5), 
            legend = TRUE,
            main = "Sample Pieplot Over Polygon (WITS Data)")

This allows you to visualize the spatial nature of a categorical variable.

Lesson 4: Exporting a KML File from R

Exporting and using KML Files

Google Earth is an extremely versatile tool for understanding geospatial data. Google Earth uses KML files to represent geospatial data. R will allow us to write our geospatial data to a KML file. To do this, you will have to install the maptools and the rgdal libraries. The code for writing a SpatialPointsDataFrame to a KML file is given below:

library(ggmap)
library(maptools)
library(rgdal)

data(crime)                                 #Load the Crime Data Set
crime<-crime[!is.na(crime$lat),]            #Get rid of the NA's in the crime data set
crime<-crime[1:100,]                        #Get only the first 100 records

coordinates(crime)<-c("lon","lat")          #Build a SpatialPointsData Frame
 proj4string(crime)<-CRS("+proj=longlat +datum=WGS84")

##The two lines below convert the month and day columns to character dat
##(both of these line are originally 'factor' data, which is not compatible)
crime$month<-as.character(crime$month)
crime$day<-as.character(crime$day)

##The one line of code below writes our SpatialPointsDataFrame to a KML File
writeOGR(crime, dsn="crime.kml", layer= "crime", driver="KML")

Now we just have to go to our working directory and double click on the KML file to open it in Google Earth (assuming that Google Earth is already installed on your computer)

alt text